In social and economic studies many of the collected variables are measured on a nominal scale, often with a large number of categories. The definition of categories is usually ambiguous, and different classification schemes using either a finer or a coarser grid are possible. Categorisation has an impact when such a variable is included as a covariate in a regression model: too fine a grid will result in imprecise estimates of the corresponding effects, whereas with too coarse a grid important effects will be missed, resulting in biased effect estimates and poor predictive performance. To achieve automatic grouping of levels with essentially the same effect, we adopt a Bayesian approach and specify the prior on the level effects as a location mixture of spiky normal components. Fusion of level effects is induced by a prior on the mixture weights which encourages empty components. Model-based clustering of the effects during MCMC sampling makes it possible to simultaneously detect categories which have essentially the same effect size and identify variables with no effect at all. The properties of this approach are investigated in simulation studies. Finally, the method is applied to analyse effects of high-dimensional categorical predictors on income in Austria.
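To make the prior construction concrete, the following is a minimal sketch of drawing level effects from a location mixture of narrow normal components whose weights tend to leave components empty. It is an illustration only, not the authors' implementation: the symmetric Dirichlet prior on the weights, the function name `draw_level_effects`, and all hyperparameter values (`e0`, `component_sd`, `location_sd`) are assumptions chosen for this example.

```python
import numpy as np

def draw_level_effects(n_levels=10, n_components=10, e0=0.1,
                       component_sd=0.05, location_sd=2.0, seed=0):
    """Draw level effects from a sparse finite mixture prior (illustrative)."""
    rng = np.random.default_rng(seed)
    # Assumption: a symmetric Dirichlet with small concentration e0 puts most
    # of the weight on a few components, so many components stay empty.
    weights = rng.dirichlet(np.full(n_components, e0))
    # Component locations are spread widely; each component itself is "spiky"
    # (small standard deviation), so levels sharing a component are nearly fused.
    locations = rng.normal(0.0, location_sd, size=n_components)
    labels = rng.choice(n_components, size=n_levels, p=weights)
    effects = rng.normal(locations[labels], component_sd)
    return effects, labels, weights

effects, labels, weights = draw_level_effects()
n_groups = len(np.unique(labels))
print(f"{n_groups} occupied components for {len(effects)} levels")
```

Levels assigned to the same component receive almost identical effects, which is the sense in which such a prior fuses categories; a full analysis would update `labels` and `locations` within an MCMC sampler rather than draw them once.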